The Korean translation of the Appraisal of Guidelines for Research and Evaluation II (Korean AGREE
II) instrument was distributed to Korean medical societies in 2011. However, inter-rater
disagreement issues still exist. The Korean AGREE II scoring guide was therefore developed
to reduce inter-rater differences. This study examines the effect of the Korean AGREE II
scoring guide on inter-rater differences. Appraisers were randomly assigned to two
groups (Scoring Guide group and Non-Scoring Guide group). The Korean AGREE II
instrument was provided to both groups, but the scoring guide was offered to the
Scoring Guide group only. A total of 14 appraisers participated, and each guideline was
assessed by 8 appraisers. To evaluate the reliability of the Korean AGREE II scoring guide,
the correlation of scores among appraisers and domain-specific intra-class correlations (ICCs)
were compared. Most scores of the two groups were comparable. The Scoring Guide group showed
higher reliability for all guidelines, with higher correlations among appraisers and
higher ICC values in almost all domains. The scoring guide reduces inter-rater
disagreement and improves the overall reliability of the Korean AGREE II instrument.
INTRODUCTION

Knowledge translation strategies for evidence-based clinical
decision-making are embodied in clinical practice guidelines
(CPGs). In Korea, more than 100 CPGs have been developed in
the last decade, and the types and development of CPGs are
increasing (1-3). Nevertheless, there have been no discussions on
the scientific methodology of guideline development or on
appraisal of the developed CPGs. Moreover, the quality management
of CPGs, such as accreditation by prestigious institutions,
differs according to the organization developing the CPGs.

The purpose of an appraisal is to enhance the quality of
information that CPGs provide to decision-makers and
recommendations that have been developed using critically evaluated,
high-quality processes (4). For this purpose, in Europe and North
America, where CPGs are actively applied, quality management
of CPGs is performed by using standardized tools or by outlining
the requirements developers must follow during the development
process (5). In Korea, the Korean Academy of Medical
Sciences (KAMS), the federation of professional societies of
medical sciences in Korea, established a center for the appraisal of
clinical practice guidelines in 2013 and took a major role in the
quality management of CPGs through peer assessment using
the Appraisal of Guidelines for Research & Evaluation (AGREE)
instrument.

The AGREE instrument is a tool that assesses the methodological
rigor and transparency by which a CPG is developed (6). The 
original AGREE instrument was developed in 2003 in collabo- 
ration with researchers from 13 countries. In 2009, AGREE II 
was produced, improving the reliability, validity, and perfor- 
mance of appraisal. In Korea, a translation of the original AGREE 
instrument was introduced in 2006. In 2010, AGREE II was trans- 
lated into Korean and distributed to Korean medical societies. 
The usefulness of AGREE II has been verified through various
quality assessment studies of specific diseases (8-11), interna- 
tional comparison of the level of guideline development (12, 
13), and overall quality assessment of CPGs developed in spe- 
cific countries (14-17). 

However, differences in developmental environments,
health care systems, and medical cultures across countries
make it difficult to apply AGREE II uniformly to all assessments.
Consequently, AGREE II serves only as a comprehensive reference
and does not provide detailed standards (7). In addition, Korean
medical societies did not have enough experience in developing
CPGs and applying AGREE II. These limitations have led to
obstacles in appraising CPGs, including a lack of consensus among
appraisers and substantial score variability. That is, there is
inter-rater disagreement.

Therefore, the Korean AGREE II Scoring Guide (hereafter,
Scoring Guide) was developed to reduce these inter-rater
differences (7). By reflecting the characteristics of the developmental
environment of Korea, it provides detailed evaluation criteria
for each item. It is expected to reduce the variation in scores
among appraisers and to present a desirable target level for the
development of CPGs.

This study examined the effects of the Scoring Guide, with an 
emphasis on reducing inter-rater differences and improving as- 
sessment reliability. 



MATERIALS AND METHODS

Clinical practice guidelines

Two CPGs that were officially submitted for appraisal to the Korean
Medical Guideline Information Center (KoMGI) were appraised (Table 1).
The KoMGI recognizes the Korean AGREE II as the only official tool for
appraisal, but the Scoring Guide was applied for this study with the
consent of the developers.

AGREE II

AGREE II consists of 23 key items organized within six domains,
followed by two global rating items ("Overall assessment"). Each item
is rated on a 7-point scale (1 = strongly disagree to 7 = strongly
agree). Each domain captures a unique dimension of guideline quality.
AGREE II recommends that each guideline be assessed by at least two
appraisers, and preferably four, to increase the reliability of the
assessment.

Scoring guide

The Scoring Guide was developed in accordance with the Korean AGREE II.
The first draft established requirements for anchor points 1, 3, 5,
and 7 for each of the Korean AGREE II items and presented a specific
checklist for each anchor point. Final agreement was reached through a
modified Delphi consensus process. Thirteen specialists participated
in the process, and the modified Delphi was conducted twice.

Assessment and appraisal

Appraisers were randomly assigned to two groups (Scoring Guide group
and Non-Scoring Guide group). The Korean AGREE II instrument was
provided to both groups, but the Scoring Guide was offered to the
Scoring Guide group only. A total of 14 appraisers participated, and
each guideline was assessed by 8 appraisers.

Statistical analysis

A descriptive analysis of the distribution of domain scores according
to group was performed. To evaluate the reliability of the Korean
AGREE II Scoring Guide, domain-specific intra-class correlations
(ICCs) were calculated. The consistency of scores among appraisers was
assessed by analyzing the correlation between the scores within each
group. The statistical program SPSS for Windows 18.0 (SPSS, Chicago,
IL, USA) was used. P values of 0.05 or less were considered
significant.

RESULTS

Distribution of scores

Compared to the Non-Scoring Guide group, the distributions of the
domain-specific scores in the Scoring Guide group were characterized
by lower scores and smaller deviation for both CPGs. The differences
were notable for domain 2 (Stakeholder involvement), domain 3 (Rigor
of development), and domain 5 (Applicability) (Fig. 1). However, the
differences were not statistically significant.

Reliability

Inter-rater reliability was analyzed by comparing the ICCs between the
groups. The Scoring Guide groups had higher ICCs for both CPGs, and
this held in all domains except domain 4 (Clarity of presentation) and
domain 6 (Editorial independence). Domain 3 and domain 5 stood out in
particular. The overall ICC was higher in the Scoring Guide group for
both CPGs and was statistically significant in both groups (Tables 2,
3).

Consistency

The Scoring Guide groups showed higher correlations among appraisers
for both CPGs (Tables 4, 5).
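
The consistency analysis can be sketched as a pairwise correlation matrix among the appraisers in one group. The source does not say which correlation coefficient was used, so Pearson's r is an assumption here; the helper names `pearson` and `pairwise_correlations` and the item scores are hypothetical and are not data from this study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if an appraiser gives every item the same score.
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(scores_by_appraiser):
    """Map each pair of appraisers to the correlation of their item scores."""
    names = sorted(scores_by_appraiser)
    return {
        (a, b): pearson(scores_by_appraiser[a], scores_by_appraiser[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical 7-point item scores for one group of four appraisers
# (illustration only, NOT data from the study).
scores = {
    "appraiser_1": [5, 6, 3, 4, 7, 2],
    "appraiser_2": [4, 6, 3, 5, 6, 2],
    "appraiser_3": [5, 5, 2, 4, 7, 3],
    "appraiser_4": [6, 7, 4, 4, 6, 1],
}

for pair, r in pairwise_correlations(scores).items():
    print(pair, round(r, 3))
```

With four appraisers per group this yields six pairwise coefficients, which is the off-diagonal content of one half of Table 4 or Table 5.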

DISCUSSION 

Although quality management policy regarding CPG development varies
widely across countries, strict quality assessment based on the AGREE
II instrument is a common feature. This means that a thorough
understanding and proper application of AGREE II are essential for
quality management of CPGs. In Korea, AGREE II was translated and
distributed to medical societies in 2011. Nevertheless, inter-rater
disagreement issues still exist. The Scoring Guide was therefore
developed to reduce inter-rater differences. This study examined the
effects of the Scoring Guide on the reduction of inter-rater
differences.

Most scores for the two groups were comparable, and the Scoring Guide
groups showed higher reliability for both CPGs. They showed a stronger
correlation among appraisers and higher ICC values for most domains,
especially domain 2 (Stakeholder involvement), domain 3 (Rigor of
development), and domain 5 (Applicability).

Table 1. Clinical practice guidelines appraised in this study

Guideline                                                  Publication year  Edition  Developer                                                 Developmental method
Cancer pain management guideline                           2012              5th      National Cancer Center                                    Adaptation
Guideline 2012 for Gastroesophageal reflux disease (GERD)  2012              3rd      The Korean Society of Neurogastroenterology and Motility  Adaptation

Fig. 1. Distribution of Korean AGREE II domain scores according to the use of the Scoring Guide for the Cancer pain management guideline (A) and the GERD guideline (B). Black boxes are
the Scoring Guide group and white boxes are the Non-Scoring Guide group. The top and bottom of each box indicate the 75th (Q3) and 25th (Q1) percentiles, respectively, and the horizontal
line in the box indicates the 50th percentile (the median). The upper and lower ends of the whiskers represent Q3 + 1.5 × (interquartile range) and Q1 - 1.5 × (interquartile range),
respectively.

Table 2. Inter-rater reliability of Korean AGREE II instrument domain scores for Cancer pain management guideline

          Scoring guide group                  Non-scoring guide group
Domain    ICC     (95% CI)            P value  ICC     (95% CI)            P value
Domain 1  0.815   (-0.344 to 0.995)   0.046    0.682   (-1.307 to 0.992)   0.116
Domain 2  0.430   (-3.137 to 0.986)   0.251    -0.762  (-11.791 to 0.955)  0.595
Domain 3  0.722   (0.175 to 0.938)    0.011    0.473   (-0.565 to 0.882)   0.121
Domain 4  0.718   (-1.048 to 0.993)   0.096    -0.296  (-8.411 to 0.967)   0.503
Domain 5  0.424   (-1.925 to 0.960)   0.229    0.273   (-2.693 to 0.950)   0.312
Domain 6  0.000   (-16.443 to 0.999)  0.391    0.000   (-16.443 to 0.999)  0.391
Overall   0.826   (0.671 to 0.918)    < 0.001  0.680   (0.395 to 0.850)    < 0.001

ICC, Intra-class correlation; CI, Confidence interval.

Table 3. Inter-rater reliability of Korean AGREE II instrument domain scores for GERD guideline

          Scoring guide group                  Non-scoring guide group
Domain    ICC     (95% CI)            P value  ICC     (95% CI)            P value
Domain 1  0.821   (-0.303 to 0.995)   0.043    -0.333  (-0.806 to 0.966)   0.512
Domain 2  0.769   (-0.675 to 0.994)   0.068    0.769   (-0.679 to 0.994)   0.069
Domain 3  0.796   (0.394 to 0.954)    0.002    0.424   (-0.710 to 0.871)   0.155
Domain 4  -1.333  (-15.940 to 0.941)  0.670    0.000   (-6.260 to 0.975)   0.422
Domain 5  0.888   (0.431 to 0.992)    0.005    0.272   (-2.696 to 0.950)   0.312
Domain 6  0.667   (-4.814 to 1.000)   0.182    0.792   (-2.634 to 1.000)   0.116
Overall   0.869   (0.753 to 0.939)    < 0.001  0.662   (0.362 to 0.841)    < 0.001

ICC, Intra-class correlation; CI, Confidence interval.
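
The domain-specific ICCs reported in Tables 2 and 3 can be illustrated with a minimal sketch. The source states only that ICCs were calculated in SPSS and does not name the ICC model, so the two-way consistency form used here (single and average measures, following the standard ANOVA decomposition) is an assumption, as are the function name `icc_consistency` and the example scores, which are hypothetical and not data from this study.

```python
def icc_consistency(ratings):
    """Two-way consistency ICCs for an items x appraisers table.

    ratings: list of rows, one row per item, one column per appraiser.
    Returns (single_measure_icc, average_measure_icc).
    Assumed model: two-way, consistency (the source does not state the model).
    """
    n = len(ratings)       # number of items
    k = len(ratings[0])    # number of appraisers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for items (rows), appraisers (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Undefined (division by zero) if all items have the same mean score.
    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average


# Hypothetical 7-point scores: five items rated by four appraisers
# (illustration only, NOT data from the study).
domain_scores = [
    [5, 6, 5, 6],
    [3, 3, 4, 3],
    [6, 7, 6, 6],
    [2, 3, 2, 2],
    [4, 5, 4, 5],
]

single_icc, average_icc = icc_consistency(domain_scores)
print(round(single_icc, 3), round(average_icc, 3))
```

When appraisers disagree more than the items differ (the error mean square exceeds the between-item mean square), both coefficients become negative, and the average-measures form can fall below -1, which is one way values below zero such as those in Tables 2 and 3 can arise.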
Table 4. Association among the appraisers according to use of Scoring Guide with Cancer pain management guideline

           Scoring guide group             Non-scoring guide group
Appraiser  1       2       3       4       1       2       3       4
1          1       0.622   0.348   0.481   1       -0.225  -0.434  -0.459
2          0.622   1       0.393   0.596   -0.225  1       0.373   0.502
3          0.348   0.393   1       -0.052  -0.434  0.373   1       0.833
4          0.481   0.596   -0.052  1       -0.459  0.502   0.833   1

Boldface values are statistically significant at the P < 0.05 level.














Table 5. Association among the appraisers according to use of Scoring Guide with GERD guideline

           Scoring guide group             Non-scoring guide group
Appraiser  1       2       3       4       1       2       3       4
1          1       0.853   0.453   0.491   1       0.556   0.441   0.127
2          0.853   1       0.651   0.749   0.556   1       0.479   0.128
3          0.453   0.651   1       0.641   0.441   0.479   1       0.372
4          0.491   0.749   0.641   1       0.127   0.128   0.372   1

Boldface values are statistically significant at the P < 0.05 level.

To better understand the results, the characteristics of the
appraisers' environment, which affect their decisions, must be
considered (11, 18-20). In the Korean healthcare environment,
stakeholder involvement is an unfamiliar concept. Participation
of patients and citizens is seldom guaranteed, and even
when they participate, their power and rights are generally weak
(21, 22). As a result, only experts are recognized as stakeholders.
This lack of experience with stakeholder involvement in health
policy and the varying definitions of stakeholder among appraisers are
thought to have influenced the results. In this study, we found
that the Scoring Guide reduced the gap in experience and
understanding among appraisers by providing clear standards
regarding stakeholders and their level of participation.

Another distinct domain is applicability. Applicability evaluates
whether facilitators and barriers to a guideline's implementation and
the potential resource implications of applying the recommendations
have been considered. Strategies used to promote the
implementation of CPGs are diverse, and the effect of application
differs depending on the user's environment (23-25). Since
there are few or no implementation strategies or efforts to
promote the implementation level in Korea, differences in awareness
and environments across appraisers are thought to affect
inter-rater differences. In this respect, the Scoring Guide, which
provides specific criteria related to resources and methodology
for measuring the level of implementation, could close
the gaps in experience and awareness among appraisers.

This study suggests that the low reliability across appraisers 
arises from the effects of the healthcare environment and char- 
acteristics of the appraisers, rather than the validity of the Kore- 
an AGREE II instrument itself. Using these findings, we can find 
ways to overcome those limitations, and expand the use of evi- 
dence-based CPGs in Korea. 

Low reliability was noticeable where CPG development was of low
quality. This means that enhancing CPG developmental competency
is the first step in CPG quality management and in improving
the reliability of appraisers. Therefore, priority should be placed
on the development of high-quality CPGs, and developers should
be provided with tools and programs for developing CPGs. Two
guidebooks related to de novo and adaptation methods have
been developed and disseminated, but few societies are aware
of these manuals, and active implementation strategies are
insufficient (26). Guidebooks and programs to aid guideline
development using scientific methodology and covering comprehensive
developmental processes could be an effective strategy.